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Abstract. Monte Carlo methods play an important role in scientific computation, 
especially when problems have a vast phase space. In this lecture an introduction 
to the Monte Carlo method is given. Concepts such as Markov chains, detailed 
balance, critical slowing down, and ergodicity, as well as the Metropolis algorithm are 
explained. The Monte Carlo method is illustrated by numerically studying the critical 
behavior of the two-dimensional Ising ferromagnet using finite-size scaling methods. 
Furthermore, advanced Monte Carlo methods are described (e.g., the Wolff cluster 
algorithm and parallel tempering Monte Carlo) and illustrated with nontrivial models 
from the physics of glassy systems. 
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1 Introduction 

The Monte Carlo method in computational physics is possibly one of the most im- 
portant numerical approaches to study problems spanning all thinkable scientific dis- 
ciplines. The idea is seemingly simple: Randomly sample a volume in d-dimensional 
space to obtain an estimate of an integral at the price of a statistical error. For 
problems where the phase space dimension is very large — this is especially the case 
when the dimension of phase space depends on the number of degrees of freedom — the 
Monte Carlo method outperforms any other integration scheme. The difficulty lies in 
smartly choosing the random samples to minimize the numerical effort. 

The term Monte Carlo method was coined in the 1940s by physicists S. Ulam, 
E. Fermi, J. von Neumann, and N. Metropolis (amongst others) working on the nu- 
clear weapons project at Los Alamos National Laboratory. Because random numbers 
(similar to processes occurring in a casino, such as the Monte Carlo Casino in Monaco) 
are needed, it is believed that this is the source of the name. Monte Carlo methods 
were central to the simulations done at the Manhattan Project, yet mostly hampered 
by the slow computers of that era. This also spurred the development of fast random 
number generators, discussed in another lecture of this series. 

In this lecture, focus is placed on the standard Metropolis algorithm to study prob- 
lems in statistical physics, as well as a variation known as exchange or parallel tem- 
pering Monte Carlo that is very efficient when studying problems in statistical physics 
with complex energy landscapes (e.g., spin glasses, proteins, neural networks) [1]. In 
general, continuous phase transitions are discussed. First-order phase transitions are, 
however, beyond the scope of these notes. 

2 Monte Carlo integration 

The motivation for Monte Carlo integration lies in the fact that most standard in- 
tegration schemes fail for high-dimensional integrals. At the same time, the space 
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dimension of the phase space of typical physical systems is very large. For exam- 
ple, the phase space dimension for N classical particles in three space dimensions 
is d = 6N (three coordinates and three momentum components are needed to fully 
characterize a particle) . This is even worse for the case of classical Ising spins (dis- 
cussed below) which can take the values ±1. In this case the phase space dimension 
is 2^, a number that grows exponentially fast with the number of spins! Therefore, 
integration schemes such as Monte Carlo methods, where the error is independent of 
the space dimension, are needed. 

2.1 Traditional integration schemes 

Before introducing Monte Carlo integration, let us review standard integration 
schemes to highlight the advantages of random sampling methods. In general, the 
goal is to compute the following one- dimensional integral 



Traditionally, one partitions the interval [a,b] into M slices of width S = {b — a)/M 
and then performs a fcth order interpolation of the function f{x) for each interval to 
approximate the integral as a discrete sum (see Fig. [T]). For example, to first order, 
one performs the midpoint rule where the area of the Ith slice is approximated by a 
rectangle of width 6 and height f[{xi + xi+i)/2]. It follows that 



For M — )■ oo the discrete sum converges to the integral of f{x). Convergence can be 
improved by replacing the rectangle with a linear interpolation between xi and a;/+i 
(trapezoidal rule) or a weighted quadratic interpolation (Simpson's rule) [62! • One 
can show that the error made due to the approximation of the function is proportional 
to ^ for the midpoint rule if the function is evaluated at one of the interval's 

edges (in the center as shown above ^ M~^), ~ for the trapezoidal rule, and 

^ M^'^ for Simpson's rule. The convergence of the midpoint rule can thus be slow 
and the method should be avoided. 

A problem arises when a multi-dimensional integral needs to be computed. In this 
case one can show that, for example, the error of Simpson's rule scales as ^ M"^/'' 
because each space component has to be partitioned independently. Clearly, for space 
dimensions larger than 4 convergence becomes very slow. Similar arguments apply for 
any other traditional integration scheme where the error scales as '-^ M"'': if applied 
to a c?-dimensional integral the error scales ~ M~'^/'^. 

2.2 Simple and Markov-chain sampling 

One way to overcome the limitations imposed by high-dimensional volumes is simple 
sampling Monte Carlo. A simple analogy is to determine the area of a pond by 




(1) 



M-l 




(2) 



3 



Monte Carlo Methods (Katzgraber) 



fix) 



b X 

Figure 1: Illustration of the midpoint rule. The integration interval [a, h] is divided 
into M slices, the area of each slice approximated by the width of the slice, 5 — 
{b — a)/M, times the function evaluated at the midpoint of each slice. 

throwing rocks. After enclosing the pond with a known area (e.g., a rectangle) and 
having enough beer or wine ]2[ , pebbles are randomly thrown into the enclosed area. 
The ratio of stones in the pond and the total number of thrown stones is a simple 
sampling statistical estimate for the area of the pond, see Fig. [2j 



Figure 2: Illustration of simple-sampling 
Monte Carlo integration. An unknown area 
(pond) is enclosed by a rectangle of known 
area A = ah. By randomly sampling the 
area with pebbles, a statistical estimate of 
the pond's area can be computed. 



b 

A slightly more "scientific" example is to compute tt by applying Monte Carlo 
integration to the unit circle. The area of the unit circle is given by Ao — Trr^ with 
r = 1; the top right quadrant can be enclosed by a square of size r and area = 
(see Fig. [3]). An estimate of tt can be accomplished with the following pseudo-code 
algorithm [3] that performs a simple sampling of the top-right quadrant: 

1 algorithm simple_pi 



2 initialize n_hits 

3 initialize m_trials 10000 

4 initialize counter 

5 

6 while (counter < m_trials) do 

7 X = rand(0,l) 

8 y = rand(0,l) 

9 if(x**2 + y**2 < 1) 

10 n_hits++ 

11 fi 

12 counter++ 

13 done 

14 



15 return pi = 4*n_hits/m_trials 
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Figure 3: Monte Carlo esti- 
mate of TT by randomly sam- 
pling the unit circle: two ran- 
dom numbers x and y in the 
range [0, 1] are computed. If 
+ 3/^ ^ Ij the resulting point 
is in the unit circle. After M 
trials an estimate of 7r/4 can be 
computed with a statistical er- 
ror ~ M-i/^ 

For each of the m_trials trials we generate two uniform random numbers |62| in the 
interval [0, 1] [with rand(0, 1)] and test in line 9 of the algorithm if these lie in the 
unit circle or not. The counter nJiits is then updated if the resulting number is in 
the circle. In line 15 a statistical estimate of tt is then returned. 

Before applying these ideas to the integration of a function, we introduce the 
concept of a Markov chain [53] . In the simple-sampling approach to estimate the area 
of a pond as presented above, the random pebbles used are independent in the sense 
that a newly-selected pebble to be thrown into the rectangular area in Fig.[2]does not 
depend in any way on the position of the previous pebbles. If, however, the pond is 
very large, it is impossible to throw pebbles randomly from one position. Thus the 
approach is modified: After enough beer you start at a random location (make sure to 
drain the pond first) and throw a pebble into a random direction. You then walk to 
that pebble, pull a new pebble out of a pebble bucket you have with you and repeat 
the operation. This is illustrated in Fig.lH If the pebble lands outside the rectangular 
area, the thrower should go get the outlier and place it on the current position of the 
thrower, i.e., if the move lies outside the sampled area, it is rejected and the last move 
counted twice. Why? This will be explained later and is called detailed balance (see 
p. [T4|) . Basically, it ensures that the Markov chain is reversible. After many beers 
and throws, pebbles are scattered around the rectangular area, with small piles of 
multiple pebbles closer to the boundaries (due to rejected moves). 

Figure 4: Illustration of Markov- 
chain Monte Carlo. The new state 
is always derived from the previous 
state. At each step a pebble is thrown 
in a random direction, the following 
throw has its origin at the landing po- 
sition of the previous one. If a peb- 
ble lands outside the rectangular area 
(cross) the move is rejected and the 
last position recorded twice (double 
circle) . 

Again, these ideas can be used to estimate tt by Markov-chain sampling the unit 
circle. Later, the Metropolis algorithm, which is based on these simple ideas, is 
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introduced in detail using models from statistical physics. The following algorithm 
describes Markov-chain Monte Carlo for estimating tt: 

1 algorithm markov_pi 



2 initialize n_hits 

3 initialize m_trials 10000 

4 initialize x 

5 initialize y 

6 initialize counter 

7 

8 while (counter < m_trials) do 

9 dx = rand(-p,p) 

10 dy = rand(-p,p) 

11 if(|x + dxl < 1 and |y + dyl < 1) 

12 X = X + dx 

13 y = y + dy 

14 f i 

15 if(x**2 + y**2 < 1) 

16 n_hits++ 

17 fi 

18 counter++ 

19 done 

20 



21 return pi = 4*n_hits/m_trials 

The algorithm starts from a given position in the space to be sampled [here (0, 0)] 
and generates the position of the new dot from the position of the previous one. If 
the new position is outside the square, it is rejected (line 11). A careful selection of 
the step size p used to generate random numbers in the range [—p,p] is of importance: 
When p is too small, convergence is slow, whereas if p is too large many moves are 
rejected because the simulation will often leave the \init square. Therefore, a value of 
p has to be selected such that consecutive moves are accepted approximately 50% of 
the time. 

The simple-sampling approach has the advantage over the Markov-chain approach 
in that the different samples are independent and thus not correlated. In the Markov- 
chain approach the new state depends on the previous state. This can be a problem 
since there might be a "memory" associated with this behavior. If this memory is 
large, then the autocorrelation times (i.e., the time it takes the system to forget where 
it was) are large and many moves have to be discarded. Then why even think about 
the Markov-chain approach? Because in the study of physical systems it is generally 
easier to slightly (and randomly) change an existing state than to generate a new state 
from scratch for each step of the calculation. For example, when studying a system 
of A'^ spins it is easier to flip one spin according to a given probability distribution 
than to generate a new configuration from scratch with a pre-determined probability 
distribution. 

Let us apply now these ideas to perform a simple-sampling estimate of the integral 
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of an actual function. As an example, we select a simple function, namely 

fix) = x" ^ / - f{x)dx (3) 

Jo 

with n > — 1. Using simple-sampling Monte Carlo, the integral can be estimated via 
1 algorithm simple_integrate 



2 initialize integral 

3 initialize m_trials 10000 

4 initialize counter 

5 

6 while (counter < m_trials) do 

7 X = rand(0,l) 

8 integral += x**n 

9 counter++ 
10 done 

11 



12 return integral/m_trials 

In line 8 we evaluate the function at the random location and add the result to the 
estimate of the integral, i.e., 

1 

^-mE/(^0, (4) 

i 

where we have set in_t rials = M. To calculate the error of the estimate, we need to 
compute the variance of the function. For this we need to also perform a simple sam- 
pling of the square of the function, i.e., add a line to the code with integral_square 
+= x**(2*n). It then follows |45] for the statistical error of the integral 61 

with 

.1 , M 

(/■>= I [f{xtdx^-Y.^f{x,)f . (6) 

Here Xi are uniformly distributed random numbers. The important detail is that 
Eq. ([5]) does not depend on the space dimension and merely on Af~^/^. This means 
that, for example, for space dimensions d > 8 Monte Carlo sampling outperforms 
Simpson's rule. 

The presented simple-sampling approach has one crucial problem: When in the 
example shown the exponent n is close to —1 or much larger than 1 the variance of 
the function in the interval is large. At the same time, the interval [0, 1] is sampled 
uniformly. Therefore, similar to the estimate of tt, areas which carry little weight 
for the integral are sampled with equal probability as areas which carry most of the 
function's support (see Fig. [5]) . Therefore the integral and error converge slowly. To 
alleviate the situation and shift resources where they are needed most, importance 
sampling is used. 
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Figure 5: Illustration of the simple-sampling 
approach when integrating f{x) — x" with n ^ 
1. The function has most support for x — >■ 1. 
Because random numbers are generated with 
a uniform probability, the whole range [0, 1] is 
sampled equally probable, although for x — > 
the contribution to the integral is small. Thus, 
the integral converges slowly. 




2.3 Importance sampling 

When the variance of the function to be integrated is large, the error [directly depen- 
dent on the variance, see Eq. ([5])] is also large. A cure to the problem is provided 
by generating random numbers that more efficiently sample the area, i.e., distributed 
according to a function p{x) which, if possible, has to fulfill the following criteria: 
First, p{x) should be as close as possible to f{x) and second, generating p-distributed 
random numbers should be easily accomplished. The integral of f{x) can be expressed 
in the following way [using the notation introduced in Eq. ([6])] 

(/) = {f/p)r> = t ^Pi^)dx « i- f; . (7) 

Jo P{x) M ^ p{yi) 

In Eq. I?]) {■ ■ ■)p corresponds to a sampling with respect to p-distributed random 
numbers; yi are also p-distributcd. The advantage of this approach is that the error 
is now given in terms of the variance Var(//p) and, if both f{x) and p{x) are close, 
the variance of f /p is considerably smaller than the variance of /. 

For the case of f{x) = a;" we could, for example, select random numbers dis- 
tributed according to p(x) ^ x^ with £ > n (when n > —1). This means that in 
Fig. M the area around a; < 1 is sampled with a higher probability than the area 
around x ^ 0. Power-law distributed random numbers y can be readily produced 
from uniform random numbers x by inverting the cumulative distribution of p{x), 
i.e., 

yix) = x'/<-^+'^ , e>-l. (8) 

In the next sections the elaborated concepts are applied to problems in (statistical) 
physics. First, some toy models and physical approaches to study the critical behavior 
of statistical models using finite-size simulations are introduced. 

3 Interlude: Statistical mechanics 

In this section the core concepts of statistical mechanics are presented as well as a 
simple model to study phase transitions. Because discussing these topics at length is 
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beyond the scope of these lecture notes, the reader is referred to the vast literature 
in statistical physics, in particular Refs. P^[^[^[551l^[571 [75] . 

3.1 Simple toy model: The Ising model 

Developed in 1925 [35] by Ernst Ising and Wilhelm Lenz, the Ising model has become 
over the decades the drosophila of statistical mechanics. The simplicity yet rich 
behavior of the model makes it the perfect platform to study many magnetic systems 
as well as for testing of algorithms. For simplicity, it is assumed that the magnetic 
moments are highly anisotropic, i.e., they can only point in one space direction. The 
classical spins 5^ = ±1 are placed on a hypercubic lattice with nearest-neighbor 
interactions. Therefore, the Hamiltonian is given by 



The first term in Eq. ([9]) is responsible for the pairwise interaction between two 
neighboring spins Si and Sj. When Jij = —J < 0, the energy is minimized by 
aligning all spins, i.e., ferromagnetic order, whereas when — J > the energy is 
minimized by ensuring that the product over all neighboring spins is negative. In this 
case, staggered antiferromagnetic order is obtained for T — > 0. The "(?, j)" represents 
a sum over nearest-neighbor pairs of spins on the lattice (see Fig. [6]) . The second term 
in Eq. ([9]) represents a coupling to an external field of strength H . Amazingly, this 
simple model captures all interesting phenomena found in the physics of statistical 
mechanics and phase transitions. It is exactly solvable in one space dimension, and in 
two dimensions for H = 0, and thus an excellent test bed for algorithms. Furthermore, 
in space dimensions larger than one it undergoes a finite-temperature transition into 
an ordered state. 

A natural way to quantify the temperature-dependent transition in the ferromag- 
netic case is to measure the magnetization 



of the system. When all spins are aligned, i.e., at low temperatures (below the 
transition), the magnetization is close to unity. For temperatures much larger than the 
transition temperature Tc, spins fluctuate wildly and so, on average, the magnetization 
is zero. Therefore, the magnetization plays the role of an order parameter that is large 
in the ordered phase and zero otherwise. Before the model is described further, some 
basic concepts from statistical physics are introduced. 

3.2 Statistical physics in a nutshell 

It would be beyond the scope of this lecture to discuss in detail statistical mechanics 
of magnetic systems. The reader is referred to the vast literature on the topic [TSjlSSl 
[5Sl[321in3ISZl[ZS] • In this context only the relevant aspects of statistical physics are 
discussed. 




(9) 




(10) 
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Figure 6: Illustration of the 
two-dimensional Ising model 
with nearest-neighbor inter- 
actions. Filled [open] circles 
represent Si = +1 [Si = —1]. 
The spins only interact with 
their nearest neighbors (lines 
connecting the dots). 

Observables In statistical physics, expectation values of quantities such as the 
energy, magnetization, specific heat, etc. — generally called observables — are computed 
by performing a trace over the partition function Z. Within the canonical ensemble 
|33j where the temperature T is fixed, the expectation value or thermal average of an 
observable O is given by 

(0>-|^0(.)e-W^. (11) 

s 

The sum is over all states s in the system, and k represents the Boltzmann constant. 
Z = '^sQ^p[—'H{s)/kT] is the partition function which normalizes the equilibrium 
Boltzmann distribution 

V^^is) = . (12) 

The (• • • ) in Eq. (fTTI) represent a thermal average. One can show that the internal 
energy of the system is given by 

E - {His)) , (13) 

whereas the free energy T is given by 

J'=-kT\nZ. (14) 

Note that all thermodynamic quantities can be computed directly from the partition 
function and expressed as derivatives of the free energy (see Ref. [79] for details). 
Because the partition function is closely related to the Boltzmann distribution, it 
follows that if we can sample observables (e.g., measure the magnetization) with 
states generated according to the corresponding Boltzmann distribution, a simple 
Markov-chain "integration" scheme can be used to produce an estimate. 

Phase transitions Continuous phase transitions [S^ have no latent heat at the 
transition and are thus easier to describe. At a continuous phase transition the free 
energy has a singularity that usually manifests itself via a power-law behavior of the 
derived observables at criticality. The correlation length ^ [33] — which gives us a 
measure of correlations and order in a system — diverges at the transition 

C~|T-r,|-% (15) 
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with V a critical exponent quantifying this divergence and Tc the transition tempera- 
ture. Close enough to the transition (i.e., \r — T^jTc ^ 1) the behavior of observables 
can be well described by power laws. For example, the specific heat cy has a singu- 
larity at Tc with cv '-^ ~ 7c|~", although the exponent a (unlike v) can be both 
negative and positive. The magnetization does not diverge, but has a singular kink 
at Tc, i.e., m^\T - T^f with /3 > 0. 

Using arguments from the renormalization group |25j it can be shown that the crit- 
ical exponents are related via scaling relations. Often (as in the Ising case), only two 
exponents are independent and fully characterize the critical behavior of the model. 
It can be further shown that models in statistical physics generally obey universal be- 
havior (there are some exceptions. . . ), i.e., if the lattice geometry is kept the same, the 
critical exponents only depend on the order parameter symmetry. Therefore, when 
simulating a statistical model, it is enough to determine the location of the transition 
temperature as well as two independent critical exponents to fully characterize the 
universality class of the system. 

Finite-size scaling and the Binder ratio (or "Binder cumulant") How can 

we determine the bulk critical exponents of a system by simulating finite lattices? 
When the systems are not infinitely large, the critical behavior is smeared out. Again, 
using arguments from the renormalization group, one can show that the nonanalytic 
part of a given observable can be described by a finite-size scaling form [63 . For 
example, the finite-size magnetization from a simulation of an Ising system with L'^ 
spins is asymptotically (close to the transition, and for large L) given by 

(mi)^L^/''Af[Li/''(r-r,)] , (16) 

and for the magnetic susceptibility by 

XL-L''/-'C[L^^'^{T-T,)] , (17) 

where close to the transition x ^ \T — Tc\^'' (for the infinite system, L — > cxi) and 

X=^{{m')-{mr) . (18) 

Both Af and C are unknown scaling functions. Equations (|16p and ()17p show that 
when T = T^, data for {rriLl/L^^'^ and xl/L''/'^ simulated for different system sizes 
L should cross in the large-L limit at one point, namely T ~ Tc, provided we use 
the right expressions for and respectively. In reality, there are nonanalytic 
corrections to scaling and so the crossing points between two successive system size 
pairs (e.g., L and 2L) converge to a common crossing point for L — >■ cc that agrees 
with the bulk transition temperature Tc- Performing the finite-size scaling analysis 
with the magnetization or the susceptibility is not very practical, because neither /? 
nor 7 are known a priori. There are other approaches to determine these, but a far 
simpler method is to determine combined quantities that are dimensionless. One such 
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quantity is known as the Binder ratio (or "Binder cumulant" ) [9] given by 

^G[Li/''(r-Te)] . (19) 

The dilfcrcnt factors ensure that g — > 1 for T — > and g -> for T — >■ oo. The 
asymptotic (for large L) scaling behavior of the Binder ratio follows directly from the 
fact that the pre- factors of the moments of the magnetization (m*^ ^ cancel 
out in Eq. ([T9l) . 

The Binder ratio is a dimensionless quantity and so data for different system sizes 
L approximately cross at a putative transition — provided corrections to scaling are 
small. Furthermore, by carefully selecting the correct value of the critical exponent i/, 
the data fall onto a universal curve. Therefore, the method allows for an estimation 
of Tc, as well as the critical exponent ly. This is illustrated in Fig. [7] for the two- 
dimensional Ising model. The left panel shows the Binder ratio as a function of 
temperature for several small system sizes. The vertical dashed line marks the exactly- 
known value of the critical temperature, namely Tc = 2/ln(l + \/2) ~ 2.269 . . . [79^. 
The right panel shows a finite-size scaling analysis of the data for the exact value of 
the critical exponent v. Close to the transition the data fall onto a universal curve, 
showing that = 1 is the correct value of the critical exponent. The two-dimensional 
Ising universality class is fully characterized with a second critical exponent, e.g., 
/3 = 1/8. 
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Figure 7: Left panel: Binder ratio as a function of temperature for the two- 
dimensional Ising model with nearest-neighbor interactions. The data approxi- 
mately cross at one point (the dashed line corresponds to the exactly-known Tc for 
the two-dimensional Ising model) signaling a transition. Right panel: Finite-size 
scaling of the data in the left panel using the known Tc = 2.269 . . . and u = 1. Plot- 
ted are data for the Binder ratio as a function of the scaling variable L}^''\T — Tc]. 
Data for different system sizes fall onto a universal curve suggesting that the pa- 
rameters used are the correct ones. 



Note that other dimensionless quantities, such as the two-point finite-size correla- 
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tion length [6j[58] can also be used with similar results. 



4 Monte Carlo simulations in statistical physics 

In analogy to the importance-sampling Monte Carlo integration of functions discussed 
in Sec. [21 we can use the gained insights to sample the average of an observable in 
statistical physics. In general, as shown in Eq. (fTTI) . 

(^) = ^£,i.(.)/.T ■ (20) 
Equation ((20)) can be trivially extended with a distribution for the states, i.e., 

EJe-«(^)/'=^/7'(s)]7'(s) " ^ ^ 

The approach is completely analogous to the importance sampling Monte Carlo inte- 
gration. If 7^(s) is the Boltzmann distribution [Eq. ([T2|) ] then the factors cancel out 
and we obtain 

(^) = I7E^(^^)' (22) 

i 

where the states Si are now selected according to the Boltzmann distribution. The 
problem now is to find an algorithm that allows for a sampling of the Boltzmann 
distribution. The method is known as the Metropolis algorithm. 

4.1 Metropolis algorithm 

The Metropolis algorithm 52 was developed in 1953 at Los Alamos National Lab 
within the nuclear weapons program mainly by the Rosenbluth and Teller families [4, . 
The article in the Journal of Chemical Physics starts in the following way: 

" The purpose of this paper is to describe a general method, suitable for fast 
electronic computing machines, of calculating the properties of any substance 
which may be considered as composed of interacting individual molecules." 

And they were right. The idea is the following: In order to evaluate Eq. (|20|) we 
generate a Markov chain of successive states si — S2 — > .... The new state is 
generated from the old state with a carefully-designed transition probability 7^(5 ^ s') 
such that it occurs with a probability given by the equilibrium Boltzmann distribution, 
i.e., Voqis) = Z~-^ cxp[—H{s)/kT]. In the Markov process, the state s occurs with 
probability Vk{s) at the fcth time step, described by the master equation 

Vk+i{s) = Vk{s) + J2 - Tis ^ s')rkis)] . (23) 
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The sum is over all states s' and the first term in the sum describes all processes 
reaching state s, while the second term describes all processes leaving state s. The 
goal is that for k ^ oo the probabilities Vk{s) reach a stationary distribution described 
by the Boltzmann distribution. The transition probabilities T can be designed in such 
a way that for Vk{s) = Vcqis), all terms in the sum vanish, i.e., for all s and s' the 
detailed balance condition 

Ti-s' ^ s)7'eq(s') = T{s ^ s')^cq(s) (24) 

must hold. The condition in Eq. (1^^ means that the process has to be reversible. 
Furthermore, when the system has assumed the equilibrium probabilities, the ratio 
of the transition probabilities only depends on the change in energy A'H(s,s') = 
His') -His), i.e., 

I^i',^ - cxp[-(H(s') - His))/kT] = cxp[-AH(s, s')/kT] . (25) 
/ (s' s) 

There are different choices for the transition probabilities T that satisfy Eq. ([25|) . One 
can show that T has to satisfy the general equation 7~(a;)/T(l/x) = x Vx with x = 
exp(— AH/fcT). There are two convenient choices for T that satisfy this condition: 

1. Metropolis (also known as Metropolis-Hastings) algorithm In this case 
Tix) — min(l, x) and so 

, r, if AH < 0; 

Tis^s') = { - (26) 

In Eq. (|26l) . F"^ represents a Monte Carlo time. 



2. Heat-bath algorithm In this case Tix) — x/il + x) corresponding to an 
acceptance probability ^ [1 + exp(A'H(s, s')/kT)]^^. For the rest of this lecture, we 
focus on the Metropolis algorithm. The heat bath algorithm is more efficient when 
temperatures far below the transition temperature are sampled. 

The move between states s and s' can, in principle, be arbitrary. If, however, the 
energies of states s and s' are too far apart, the move will likely not be accepted. For 
the case of the Ising model, in general, a single spin Si is selected and flipped with 
the following probability: 

{F, for Si — — sign(ft,,): 

Pg-2S,/i,/fcT^ for S^ = sign(/i,) . 

where hi = — J2j^i Jij^j + -ff is the effective local field felt by the spin Si. 
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Practical implementation of the Metropolis algorithm A simple pseudo-code 
Monte Carlo program to compute an observable for the Ising model is the following: 

1 algorithm ising_metropolis(T, steps) 



2 initialize starting configuration S 

3 initialize 0=0 

4 

5 forCcounter = 1 . . . steps) do 

6 generate trial state S' 

7 compute p(S -> S',T) 

8 X = rand (0,1) 

9 if (p > x) then 

10 accept S' 

11 fi 

12 

13 += O(S') 

14 done 

15 



16 return 0/steps 

After initialization, in line 6 a proposed state is generated by, e.g., flipping a spin. 

The energy of the new state is computed and henceforth the transition probability 
between states p = T{S — >■ S'). A uniform random number x S [0, 1] is generated. If 
the probability is larger than the random number, the move is accepted. If the energy 
is lowered, i.e., A'H > 0, the spin is always flipped. Otherwise the spin is flipped with 
a probability p. Once the new state is accepted, we measure a given observable and 
record its value to perform the thermal average at a given temperature. For steps 
— > oo the average of the observable converges to the exact value, again with an error 
inversely proportional to the square root of the number of steps. This is the core 
bare-bones routine for the Metropolis algorithm. In practice, several aspects have to 
be considered to ensure that the data produced are correct. The most important, 
autocorrelation and equilibration times, are described below. 

4.2 Equilibration 

In order to obtain a correct estimate of an observable O, it is imperative to ensure 
that one is actually sampling an equilibrium state. Because, in general, the initial con- 
figuration of the simulation can be chosen at random — popular choices being random 
or polarized configuration — the system will have to evolve for several Monte Carlo 
steps before an equilibrium state at a given temperature is obtained. The time r^q 
until the system is in thermal equilibrium is called equilibration time and depends 
directly on the system size (e.g., the number of spins N = L'^) and increases with 
decreasing temperature. In general, it is measured in units of Monte Carlo sweeps 
(MCS), i.e., 1 MCS = N spin updates. 

In practice, all measured observables should be monitored as a function of MCS to 
ensure that the system is in thermal equilibrium. Some observables, such as the 
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m(t) 




Teq t 



Figure 8: Sketch of the equihbration behavior of the magnetization m as a function 
of Monte Carlo time. After a certain time Tcq the data become approximately flat 
and fluctuate around a mean value. Once Tcq has been reached, the system is in 
thermal equilibrium and observables can be measured. 



energy, equilibrate faster than others (e.g., magnetization) and thus the equilibration 
times of all observables measured need to be considered. 

4.3 Autocorrelation times and error analysis 

Because in a Markov chain the new states are generated by modifying the previous 
ones, subsequent states can be highly correlated. To ensure that the measurement of 
an observable O is not biased by correlated configurations, it is important to measure 
the autocorrelation time Tauto that describes the time it takes for two measurements 
to be decorrelated. This means that in a Monte Carlo simulation, after the system 
has been thermally equilibrated, measurements can only be taken every Tauto MCS. 
To compute the autocorrelation time for a given observable O, the time-dependent 
autocorrelation function needs to be measured: 

r mo)0{to + t)}-{0{to)){0{to+t)) 

In general, Co{t) ~ exp(— t/Tauto) and so Tauto is given by the value where Co drops 
to 1/e. An alternative is the integrated autocorrelation time t]^^^^. It is basically the 
same as the standard autocorrelation time for any practical purpose. However, it is 
easier to compute: 

,int i:zAm^)o{t.+t))-{o?) 

"auto {O"^) — (O)^ 

Autocorrelation effects influence the determination of the error of statistical estimates. 
It can be shown [28] that the error AO is given by 



AO = y ^^^^-:^(1 + 2rauto) . (30) 

Here M is the number of measurements. The autocorrelation time directly influences 
the calculation of the error bars and must be computed and included in all calcula- 
tions. So far, we have not discussed how the autocorrelation times depend on the 
system size and the temperature. Like the equilibration times, the autocorrelation 
times increase with increasing system size. 
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4.4 Critical slowing down and the Wolff cluster algorithm 

Close to a phase transition, the autocorrelation time is given by 

Tauto - e (31) 

with z > 1 and typically around 2. Because the correlation length ^ diverges at a 
continuous phase transition, so does the autocorrelation time. This effect, known as 
critical slowing down, slows simulations to intractable times close to continuous phase 
transitions when the dynamical critical exponent z is large. 

The problem can be alleviated by using Monte Carlo methods which, while only 
performing small changes to the energy of the system (to ensure that moves are 
accepted frequently), heavily randomize the spin configurations and not only change 
the value of one spin. This ensures that phase space is sampled evenly. Typical 
examples are cluster algorithms |70 [ l78 j where a carefully-built cluster of spins is 
flipped at each step of the simulation [25 | l47 ll48l[57] . 

Wolff cluster algorithm (Ising spins) In the Wolff cluster algorithm [78] we 
choose a random spin and build a cluster around it (the algorithm is constructed 
in such a way that larger clusters are preferred). Once the cluster is constructed, 
it is flipped in a rejection-free move. This "randomizes" the system efficiently, thus 
overcoming critical slowing down. Outline of the algorithm: 

□ Select a random spin. 

□ If a neighboring spin is parallel to the initial spin, add it to the cluster with a 
probability 1 - exp(-2 J/fcT). 

□ Repeat the previous step for all neighbors of the newly-added spins and iterate 
until no new spins can be added. 

□ Flip all spins in the cluster. 

The algorithm obeys detailed balance. Furthermore, one can show that the linear size 
of the cluster is proportional to the correlation length. Therefore the algorithm adapts 
to the behavior of the system at criticality resulting in 2: « 0, i.e., the critical slowing 
down encountered around the transition is removed and the algorithm performs orders 
of magnitude faster than simple Monte Carlo. For low temperatures, the cluster 
algorithm merely "flip-flops" almost all spins of the system and provides not much 
improvement, unless a domain wall is stuck in the system. For temperatures much 
higher than the critical temperature the size of the clusters is of the order of one spin 
and there the Metropolis algorithm outperforms the cluster algorithm (keep in mind 
that building the cluster takes many operations). Thus the method works best at 
criticality. 

In general, to be able to cover a temperature range that extends beyond the critical 
region, combinations of cluster updates and local updated (standard Monte Carlo) are 
recommended. One can also define improved estimators to measure observables with 
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a reduced statistical error. Finally, the Wolff cluster algorithm can also be generalized 
to Potts spins, XY and Heisenberg spins, as well as hard disks. The reader is referred 
to the literature [lillSlll71lli[S7] for details. 

Note that the Swendsen-Wang cluster algorithm [70] is similar to the Wolff cluster 
algorithm. However, instead of building one cluster, multiple clusters are built. This 
is less efficient when the space dimension is larger than two because in that case only 
few large clusters will exist. 

4.5 When does simple Monte Carlo fail? 

Metropolis et al. did not bear in mind that there are systems where even a simple 
spin flip can produce a huge change in the energy AH of the system. This has the 
effect that the probability for new configurations to be accepted is very small and the 
simulation stalls, in particular when the studied system has a rough energy landscape, 
i.e., different states in phase space are separated by large energy "mountains" and 
deep energy "valleys," as depicted in Fig. (9] Examples of such complex systems are 
spin glasses, proteins and neural networks. 

Figure 9: Sketch of a rough en- 
ergy landscape. A Monte Carlo 
move from the initial (solid cir- 
cle) to the final state (open circle) 
is unlikely if the size of the bar- 
rier AE is large, especially at low 
temperatures. A simple Monte 
Carlo simulation will stall and 
the system will be stuck in the 
met ast able state. 

These systems are characterized by a complex energy landscape with deep valleys 
and mountains that grow exponentially with the system size. Therefore, for low tem- 
peratures, equilibration times of simple Monte Carlo methods diverge. Although the 
method technically still works, the time it takes to equilibrate even the smallest sys- 
tems becomes impractical. Improved sampling techniques for rough energy landscapes 
need to be implemented. 

5 Complex toy model: The Ising spin glass 

What happens if we take the ferromagnetic Ising model and flip the sign of randomly- 
selected interactions between two spins? The resulting behavior is illustrated in 
Fig. [ini For low temperatures, if the product of the interactions Jij around any 
plaquette is negative, frustration effects emerge. The spin in the lower left corner 
of the highlighted plaquette tries to minimize the energy by either aligning with 
the right neighbor, or being antiparallel with the top neighbor. Both conditions are 
mutually exclusive and so the energy cannot be uniquely minimized. This behavior 
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is a hallmark of spin glasses [T0 lfT5l[T7l l22 l [54 l [71 ] l80] . Note that, in general, the bonds 
are either chosen from a bimodal (Vb) or Gaussian (Vg) disorder distribution: 

n{J^J) = p5{.h, - 1) + (1 - P)<^( + 1) , VM,) = exp[-4/2] , (32) 

where, in general p = 1/2. The Hamiltonian in Eq. ^ with disorder in the bonds 
is known as the Edwards- Anderson Ising spin glass. There is a finite-temperature 
transition for space dimensions c? > 3 between a spin-glass and the (thermally) disor- 
dered state, cf. Sec. 15.21 For example, for Gaussian disorder in three space dimensions 

Tc « 0.95 p;. 




Figure 10: Two-dimensional Ising spin-glass. The circles represent Ising spins. 
A thin line between two spins i and j corresponds to Jij < 0, whereas a thick 
line corresponds to Jij > 0. In comparison to a ferromagnet, the behavior of the 
model system changes drastically, as illustrated in the highlighted plaquette. For 
T — ^ 0, the spin in the lower left corner is unable to fulfill the interactions with the 
neighbors and is frustrated (see text). 



The result of competing interactions is a complex energy landscape. The com- 
plexity of the model increases considerably. For example, finding the ground state 
energy of a spin glass is generally an NP-hard problem. Equilibration times in finite- 
temperature Monte Carlo simulations grow exponentially and thus the study of system 
sizes beyond a few spins becomes intractable. So . . . why study these systems? Not 
only are there many materials [TU] that can be described well with spin-glass Hamilto- 
nians, many other problems spanning several fields of science can be either described 
directly by spin-glass Hamiltonians or mapped onto these. Therefore these models 
are of general interest to a broad spectrum of disciplines. 

Note that, because in general only finite system sizes can be simulated, an average 
over different realizations of the disorder needs to be performed in addition to the 
thermal averages. This means that after a Monte Carlo simulation has been completed 
for a given distribution of the disorder, it must be repeated at least 1000 times for the 
results to be representative. Although this extra effort further complicates simulations 
of spin-glass systems, it makes them embarrassingly parallel, i.e., simulations can 
easily be distributed over many workstations. 
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5.1 Selected hallmark properties of spin glasses 

Because of the complex energy landscape, spin glasses show dynamical properties not 
seen in any other materials/systems. First, spin-glass observables such as suscepti- 
bilities and magnetizations age with time. Due to the complex energy landscape, 
there are rearrangements of the spins in macroscopic time scales. Therefore, when 
preparing a spin-glass system at a given temperature, a slow decay of observables can 
be observed because the system, at least experimentally, is never in thermal equilib- 
rium. Furthermore, when performing an aging experiment on a spin glass, changing 
the temperature from Ti < Tc to T2 < Ti at time ti for a finite period of time ^2 
and then back to Ti shows interesting memory effects [37]. After the time ti+t2, the 
system remembers the state it had at time ti and follows the previous aging path. 
This memory and rejuvenation effect is unique to spin glasses. 

While the susceptibility shows a cusp at the transition [12], the specific heat has 
a smooth behavior around the transition temperature [54]. Furthermore, no signs 
of spatial ordering can be found when performing a neutron scattering experiment 
probing below the transition temperature. However, Mossbauer spectroscopy shows 
that the magnetic moments are frozen in space, thus indicating that the system is in 
a glassy and not liquid phase. Therefore, in its simplest interpretation, a spin glass 
is a model for a highly-disordered magnet. 

5.2 Theoretical description 

In 1975, Edwards and Anderson suggested a phenomenological model in order to 
describe these fascinating materials: the Edwards- Anderson (EA) spin-glass Hamil- 
tonian [19] discussed above. In 1979, Parisi postulated a solution (only recently 
proven to be correct [72]) to the mean-field Sherrington-Kirkpatrick (SK) model [66] . 
a variation of the Edwards- Anderson model with infinite-range interactions (all spins 
interact with each other). The replica symmetry breaking picture (RSB) of Parisi 
for the mean-field model spawned an increased interest in the field and has been ap- 
plied to a variety of problems. In addition, in 1986 a phenomenological description, 
called the "Droplet Picture" (DP) was introduced simultaneously by Fisher & Huse 
and Bray & Moore [SDIIII] in order to describe short-range spin glasses, as well as 
the chaotic pairs picture by Newman & Stein [SSKSS] However, rigorous analytical 
results are difficult to obtain for realistic short-range spin-glass models. Because of 
this, research has shifted to intense numerical studies. 

Nevertheless, spin glasses are far from being understood. The memory effect in 
spin glasses [37] has yet to be understood theoretically, and only recently was it 
observed numerically [36] . The existence of a spin-glass phase in a finite magnetic 
field [Mj has been a source of debate [81], as well as the ultrametric structure of 
the phase space (hierarchical structure of states) which remains to be understood 
for realistic models [30l[38]. Finally, there have been several numerical attempts at 
finite [32 ] l40 l l46 | l 5 1 1 [59] and zero temperature [27l[29] to better understand the nature 
of the spin-glass state for short-range spin glasses. To date, the data for Ising spins 
are consistent with an intermediate picture |461l59j that combines elements from the 
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standard theoretical predictions of replica symmetry breaking and the droplet theory. 

How can order be quantified in a system that intrinsically does not have visible 
spatial order? For this we need to first determine what differentiates a spin glass 
at temperatures above the critical point Tc and below. Above the transition, like 
for the regular Ising model, spins fluctuate and any given snapshot yields a random 
configuration. Therefore, comparing a snapshot at time t and time t + 5t yields 
completely different results. Below the transition, (replica) symmetry is broken and 
configurations freeze into place. Therefore, comparing a snapshot of the system at 
time t and time t + 5t shows significant similarities. A natural choice thus is to define 
an overlap function q which compares two copies of the system with the same disorder. 

In simulations, it is less practical to compare two snapshots of the system at 
different times. Therefore, for practical reasons two copies (called "replicas") a and /? 
with the same disorder but different initial conditions and Markov chains are simulated 
in parallel. The order parameter is then given by 

'^=i^E^"^f' (33) 

i 

and is illustrated graphically in Fig.[TTJ For temperatures below q tends to unity 
whereas for T > Tc on average — >■ 0, similar to the magnetization for the Ising 
ferromagnet. Analogous to the ferromagnetic case, we can define a Binder ratio g by 
replacing the magnetization m with the spin overlap q to probe for the existence of a 
spin-glass state. 




a p a p 

Figure 11: Graphical representation of the order parameter function q. Two 
replicas of the system a and /3 with the same disorder are compared spin-by-spin. 
The left set corresponds to a temperature T <giTc where many spins agree and so 
g — > 1 (in the depicted example q — 0.918). The right set corresponds to T > Tc; 
the spins fluctuate due to thermal fluctuations and so g < 1 (here q = 0.408). 



6 Parallel tempering Monte Carlo 

As illustrated with the case of spin glasses in Sec. [Sj the free energy landscape of 
many-body systems with competing phases or interactions is generally characterized 
by many local minima that are separated by free-energy barriers. The simulation 
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of these systems with standard Monte Carlo [44l|48l[53] or molecular dynamics [23] 
methods is slowed down by long relaxation times due to the suppression of tunnel- 
ing through these barriers. Already simple chemical reactions with latent heat, i.e., 
first-order phase transitions, present huge numerical challenges that are not present 
for systems which undergo second-order phase transitions where improved updating 
techniques, such as cluster algorithms ^70,78], can be used. For complex systems with 
competing interactions, one instead attempts to improve the local updating technique 
by introducing artificial statistical ensembles such that tunneling times through bar- 
riers are reduced and autocorrelation effects minimized. 



One such method is parallel tempering Monte Carlo |5|24ll34|50ll69] that has proven 

to be a versatile "workhorse" in many fields [T^. Similar to replica Monte Carlo 
[69] . simulated tempering [50], or extended ensemble methods 09], the algorithm 
aims to overcome free-energy barriers in the free energy landscape by simulating 
several copies of the target system at different temperatures. The system can thus 
escape metastable states when wandering to higher temperatures and relax to lower 
temperatures again in time scales several orders of magnitude smaller than for a simple 
Monte Carlo simulation at one fixed temperature. The method has also been combined 
with several other algorithms such as genetic algorithms and related optimization 
methods, molecular dynamics, cluster algorithms and quantum Monte Carlo. 

6.1 Outline of the algorithm 

M noninteracting copies of the system are simulated in parallel at different temper- 
atures {Ti,T2, . . . , Tm}- After a fixed number of Monte Carlo sweeps (generally one 
lattice sweep) two copies at neighboring temperatures Ti and T^+i are exchanged with 
a Monte Carlo like move and accepted with a probability 



r[{E„Ti) (£;,+!, T,+i)] = min{l,exp[(£;,+i - E,){l/T,+i - f/T,)]} . (34) 



A given configuration will thus perform a random walk in temperature space over- 
coming free energy barriers by wandering to high temperatures where equilibration 
is rapid and configurations change more rapidly, and returning to low temperatures 
where relaxation times can be long. Unlike for simple Monte Carlo, the system can 
efficiently explore the complex energy landscape. Note that the update probability in 
Eq. obeys detailed balance. 

At first sight it might seem wasteful to simulate a system at multiple tempera- 
tures. In most cases, the number of temperatures does not exceed 100 values, yet the 
speedup attained can be 5 - 6 orders of magnitude. Furthermore, one often needs the 
temperature dependence of a given observable and so the method delivers data for 
different temperatures in one run. A simple implementation of the parallel tempering 
move called after a certain number of lattice sweeps using pseudo code is shown below. 
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1 algorithm parallel_tempering(*energy , *temp, *spins) 



2 

3 for(counter = 1 ... (nuin_temps - 1)) do 

4 delta = (l/temp[i] - 1/temp [i+1] )* (energy [i] - energy [i+1]) 

5 if(rand(0,l) < exp(delta)) then 

6 swapCspins [i] , spins [i+1] ) 

7 swap(energy[i] .energy [i+1] ) 

8 fi 

9 done 



The swap( ) function swaps neighboring energies and spin configurations (*spins) 
if the move is accepted. As simple as the algorithm is, some fine tuning has to be 
performed for it to operate optimally. 

6.2 Selecting the temperatures 

There are many recipes on how to ideally select the position of the temperatures for 
parallel tempering Monte Carlo to perform optimally. Clearly, when the temperatures 
are too far apart, the energy distributions at the individual temperatures will not 
overlap enough and many moves will be rejected. The result is thus M independent 
simple Monte Carlo simulations run in parallel with no speed increase of any sort. If 
the temperatures are too close, CPU time is wasted. 

A measure for the efficiency of a system copy to traverse the temperature space is 
the probability (as a function of temperature) that a swap is accepted. A good rule of 
thumb is to ensure that the acceptance probabilities are approximately independent 
of temperature, between approximately 20 ~ 80%, and do not show large fluctuations 
as these would signify the breaking-up of the random walk into segments of the tem- 
perature space. Following the aforementioned recipe, parallel tempering Monte Carlo 
already outperforms any simple sampling Monte Carlo method in a rough energy 
landscape. Still, the performance can be further increased, as outlined below. 

Traditional approaches As mentioned before, a reasonable performance of the 
algorithm can be obtained when the acceptance probabilities are approximately inde- 
pendent of temperature. If the specific heat of a system is not strongly divergent at 
the phase transition — as it is the case with spin glasses — a good starting point is given 
by a geometric progression of temperatures. Given a temperature range [Ti, Tm], the 
intermediate M — 2 temperatures can be computed via 

Tfe = Ti n i?„ = "-{[^ . (35) 

r= 1 V ^1 

Because relaxation is slower for lower temperatures, the geometric progression peaks 
the number of temperatures close to Ti. If, however, the specific heat of the system 
has a strong divergence, this approach is not optimal. 

One can show that the acceptance probabilities are inversely correlated to the 
functional behavior of the specific heat per spin cy via ATi^^+i ^ cyTi/\/N j60j . 
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Therefore, if cy diverges, the acceptance probabilities for a geometric temperature 
set show a pronounced dip at the transition temperature. More complex methods 
such as the approach by Kofke [I^IUS], its improvement by Rathore et al. [51], as 
well as the method suggested by Predescu et al. [60l[61] aim to obtain acceptance 
probabilities for the parallel tempering moves that are independent of temperature 
by compensating for the effects of the specific heat. 

Finally, the number of temperatures needed can be estimated via the behavior 
of the specific heat. One can show that M ~ V N^~'^^/'^ [M]. Here d is the space 
dimension, N the number of spins, v the critical exponent of the correlation length 
and a the critical exponent of the specific heat. 

In practice, it is straightforward to tune a temperature set produced initially via 
a geometric progression by adding interstitial temperatures where the acceptance 
rates are low. A quick simulation for only a few Monte Carlo sweeps yields enough 
information about the acceptance probabilities to tune the temperature set by hand 
without having to resort to a full equilibrium simulation. 

Improved approaches Recently, a new iterative feedback method has been in- 
troduced to optimize the position of the temperatures in parallel tempering simula- 
tions [41]. The idea is to treat the set of temperatures as an ensemble and thus use 
ensemble optimization methods |74j to improve the round-trip times of a given system 
copy in temperature space. Unlike the conventional approaches, resources are allo- 
cated to the bottlenecks of the simulation, i.e., phase transitions and ground states 
where relaxation is slow. As a consequence, acceptance probabilities are temperature- 
dependent because more temperatures are allocated to the bottlenecks. The approach 
requires one to gather enough round-trip data for the temperature sets to converge 
and thus is not always practical. For details on the implementation, see Refs. [H] 
and [73], as well as Ref. [26] for an improved version. 

A similar approach to optimize the efficiency of parallel tempering has recently 
been introduced by Bittner et al. [111. Unlike the previously- mentioned feedback 
method, this approach leaves the position of the temperatures untouched but with an 
average acceptance probability of 50%. To deal with free-energy barriers in the simu- 
lation, the autocorrelation times of the simulation without parallel tempering have to 
be measured ahead of time. The number of MCS between parallel tempering updates 
is then dependent on the autocorrelation times, i.e., close to a phase transition, more 
MCS between parallel tempering moves are performed. Again, the method is thus 
optimized because resources are reallocated to where they are needed most. Unfortu- 
nately, this approach also requires a simulation to be done ahead of time to estimate 
the autocorrelation times, but a rough estimate is sufficient. 

6.3 Example: Application to spin glasses 

To illustrate the advantages of parallel tempering over simple Monte Carlo, we show 
data for a three-dimensional Ising spin glass with Normal-distributed disorder. In 
that case, one can use an exact relationship between the energy and a fourth-order 
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spin correlator known as the link overlap qi J40!. The link overlap is given by 

■ (36) 

The sum in Eq. (p6|) is over neighboring spin pairs and the normalization is over all 
bonds. If a domain of spins in a spin glass is flipped, the link overlap measures the 
average length of the boundary of the domain. 
The internal energy per spin u is given by 

where (• • • ) represents the Monte Carlo average for a given set of bonds, and [• • • ]av 
denotes an average over the (Gaussian) disorder. One can perform an integration by 
parts over Jij to relate u to the average link overlap defined in Eq. (|36|) . i.e., 

Tv 

[{qe)U = 1 + — ■ (38) 

The simulation starts with a random spin configuration. This means that the two 
sides of Eq. (p8| approach equilibrium from opposite directions. Data for qi will be 
too small because we started from a random configuration, whereas the initial energy 
will not be as negative as in thermal equilibrium. Once both sides of Eq. ((38l) agree, 
the system is in thermal equilibrium. This is illustrated in Fig. [12] for the Edwards- 
Anderson Ising spin glass with 4'^ spins and T — 0.5 which is approximately 50% Tc. 
The data are averaged over 5000 realizations of the disorder. While the data for 
generated with parallel tempering Monte Carlo agree after approximately 300 MCS, 
the data produced with simple Monte Carlo have not reached equilibrium even after 
10^ MCS, thus illustrating the power of parallel tempering for systems with a rough 
energy landscape. 

7 Other Monte Carlo methods 

In addition to the Monte Carlo methods outlined, there is a vast selection of other 
approaches based on random Monte Carlo sampling to study physical systems. In this 
section some selected finite-temperature Monte Carlo methods are briefly outlined. 
The reader is referred to the literature for details. Note that most algorithms can be 
combined for better performance. For example, one could combine parallel tempering 
with a cluster algorithm to speed up simulations both around and far below the 
transition. 

Cluster algorithms In addition to the Wolff cluster algorithm [78] outlined in 
Sec. 14.41 the Swendsen-Wang [TOj algorithm also greatly helps to overcome critical 
slowing down of simulations close to phase transitions. There are also specially-crafted 
cluster algorithms for spin glasses [31] . 
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Figure 12: Equilibration test for spin 
glasses with Gaussian disorder. Data for 
the link overlap (circles) have to equate to 
data for the link overlap computed from the 
energy (squares). This is the case after ap- 
proximately 300 MCS when parallel tem- 
pering is used. A direct calculation of the 
link overlap using simple Monte Carlo (tri- 
angles) is not equilibrated after 10^ MCS. 
Data for L — 4, d = 3, 5000 samples, and 
r = 0.50. 



O.B - 



3D EA spin glass 




Simulated annealing Simulated annealing is probably the simplest heuristic 
ground-state search approach. A Monte Carlo simulation is performed until the sys- 
tem is in thermal equilibrium. Subsequently, the temperature is quenched according 
to a pre-defined protocol until T close to zero is reached. After each quench, the 
system is equilibrated with simple Monte Carlo. The system should converge to the 
ground state, although there is no guarantee that the system will not be stuck in a 
metastable state. 



Flat-histogram methods Flat-histogram algorithms include the multicanonical 
method broad histograms (16_ and transition matrix Monte Carlo [77 when 

combined with entropic sampling, as well as the adaptive algorithm of Wang and 
Landau [751[7B] and its improvement by Trebst et al. [TJ]. The advantage of these 
algorithms is that they allow for an estimate of the free energy; this is usually not 
available from standard Monte Carlo methods. 



Quantum Monte Carlo In addition to the aforementioned Monte Carlo methods 
that treat classical problems, quantum extensions such as variational Monte Carlo, 
path integral Monte Carlo, etc. have been developed for quantum systems [68j . 
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